Abstract

Learning-based model predictive control (LBMPC) is a technique that provides
deterministic guarantees on robustness, while statistical identification tools are used to
identify richer models of the system in order to improve performance. This technical
note provides a result that elucidates the reasons for the choice of measurement
model used with LBMPC, and it gives proofs concerning the stochastic convergence
of LBMPC. The first part of this note discusses simultaneous state estimation and
statistical identification (or learning) of unmodeled dynamics, for dynamical systems
that can be described by ordinary differential equations (ODEs). The second part
provides proofs concerning the epi-convergence of different statistical estimators that
can be used with the LBMPC technique. In particular, we prove results on the statistical
properties of a nonparametric estimator that we have designed to have the correct
deterministic and stochastic properties for numerical implementation when used in
conjunction with LBMPC.


1  Introduction 

This technical note is meant to be understood in the context of [3], and it consists of two
distinct parts. Sections 2 and 3 concern simultaneous state estimation and statistical
identification (or learning) of unmodeled dynamics, for dynamical systems that can be described
by ordinary differential equations (ODEs). The second part is found in Section 4 and
provides proofs concerning the epi-convergence of different statistical estimators that can be
used with the learning-based model predictive control (LBMPC) technique.

For the results on estimation and learning, we assume that for state vector $x \in \mathbb{R}^p$,
control input $u \in \mathbb{R}^m$, and output $y \in \mathbb{R}^q$, the system dynamics are given by the following
ODE:

$$\dot{x} = A_c x + B_c u + g_c(x,u), \qquad y = Cx, \tag{1}$$

where $A_c, B_c, C$ are matrices of appropriate size and $g_c(x,u)$ describes the unmodeled
(possibly nonlinear) dynamics. We will assume that the control inputs generated by the model
predictive control (MPC) schemes are piecewise constant

$$u(t) = u_n, \quad \forall t \in [nT_u, (n+1)T_u), \tag{2}$$

where $T_u$ is the sampling period of the input. Note that appropriately designed MPC can
generate other control schemes, such as piecewise linear inputs.


2  Limitations  on  Filtering  and  Learning 


We begin with a negative result about the inability of filters to estimate both the state and
unmodeled dynamics, for a general system in which not all states are measured. (This result
does not apply to systems with special structure, such as in [2].) This limitation applies to
situations in which unmodeled dynamics are described by a series expansion with constant
terms, and so it is relevant to a wide class of systems and filtering approaches.

Suppose the unmodeled dynamics are parameterized as $g_c(x,u) = \gamma(x,u;\theta) + K$, where $K$
is a constant, non-zero vector and $\gamma(x,u;\theta)$ is a parametrized function such that
$\gamma(x,u;\theta_0) = 0$ for some parameter value $\theta_0$. We again note that this includes the
situation in which $g_c$ is given by a series expansion (e.g., Taylor polynomial, Fourier series, etc.).

The intuition is that statistical identification (or learning) of the parameters $\theta, K$ and
estimation of the state $x$ can be cast into the framework of observability of an augmented
dynamical system. The augmented system has $y = Cx$ and dynamics

$$\frac{d}{dt}\begin{bmatrix} x \\ K \\ \theta \end{bmatrix} = \begin{bmatrix} A_c x + B_c u + \gamma(x,u;\theta) + K \\ 0 \\ 0 \end{bmatrix}. \tag{3}$$

When not all states are measured and there is no special structure on $K$, then this augmented
system is not observable. This means that $(x, K, \theta)$ cannot be simultaneously estimated using
measurements of the system output $y$. This is formalized by the following theorem.

Theorem  1.  A  necessary  condition  for  the  observability  (and  detectability)  of  the  system 
given  in  (3)  with  y  =  Cx  is  that  rank(C)  =  p. 

Proof. Suppose $\theta = \theta_0$, which makes $\gamma(x,u;\theta) = 0$. Then the system is linear and
time-invariant (LTI) with state $(x, K)$. Using the Popov-Belevitch-Hautus (PBH) test, the system is
detectable if and only if $\mathrm{rank}(\Phi) = p + p = 2p$ for all $s \in \mathbb{C}$ with $\mathrm{Re}(s) \geq 0$
(and observable if and only if this holds for all $s \in \mathbb{C}$), where

$$\Phi = \begin{bmatrix} sI - A_c & -I \\ 0 & sI \\ C & 0 \end{bmatrix}. \tag{4}$$

If $s = 0$, then the block $sI$ is zero, and the block structure of $\Phi$ implies
that $\mathrm{rank}(\Phi) = p + \mathrm{rank}(C)$. The system is not observable (and not detectable) when
$\mathrm{rank}(C) < p$, establishing necessity. □
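
As a numerical illustration of the rank argument in the proof (a sketch only, not part of the original development), the matrix $\Phi$ in (4) can be assembled at $s = 0$ and its rank compared with $p + \mathrm{rank}(C)$. The function name `pbh_matrix`, the random $A_c$, and the two output matrices are assumptions chosen for illustration.

```python
import numpy as np

def pbh_matrix(A_c, C, s=0.0):
    """Stack the PBH matrix for the augmented LTI system with state (x, K)."""
    p = A_c.shape[0]
    I = np.eye(p)
    top = np.hstack([s * I - A_c, -I])                # rows for x-dynamics
    mid = np.hstack([np.zeros((p, p)), s * I])        # rows for K-dynamics (dK/dt = 0)
    bot = np.hstack([C, np.zeros((C.shape[0], p))])   # output rows, y = C x
    return np.vstack([top, mid, bot])

rng = np.random.default_rng(0)
p = 3
A_c = rng.standard_normal((p, p))                     # arbitrary illustrative dynamics
C_partial = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])               # rank 2 < p: one state unmeasured
C_full = np.eye(p)                                    # rank p: full state measured

rank_partial = np.linalg.matrix_rank(pbh_matrix(A_c, C_partial))
rank_full = np.linalg.matrix_rank(pbh_matrix(A_c, C_full))
print(rank_partial, rank_full, 2 * p)
```

With the partial output matrix the rank is $p + \mathrm{rank}(C) = 5 < 2p$, so the PBH condition fails at $s = 0$; with the full-rank output it equals $2p$.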

Remark.  This  result  also  applies  to  discrete  time  systems,  and  the  proof  is  nearly  identical. 

In  light  of  this  negative  result  concerning  filters,  we  require  that  C  be  full  rank.  Without 
loss  of  generality,  we  assume  that  the  full  state  x  is  measured. 

3 Nonparametric Filtering for Dynamical Systems

The design of a Kalman filter for systems with unmodeled dynamics can be complex, and
so we propose a nonparametric regression approach for estimating the state. Available
approaches include local polynomial regression (LPR) or spline-smoothing; the
Savitzky-Golay filter [13] is technically a finite impulse response (FIR) filter implementation of LPR.
We design a new nonparametric filter, and one advantage of this filter is that it is easily
computed because it is a weighted sum of measurements.

An important point to note is that the statistical guarantees provided by our filter are
not the same as for a Kalman filter. The Kalman filter is defined to be consistent if its
state estimates are unbiased and the true error covariance is smaller (covariance matrices
are positive semi-definite, and so a partial order can be defined) than the estimated error
covariance. In our method, consistency is defined with respect to the sampling period $T_s$ of
state measurements. As $T_s \to 0$, the estimates converge to the real values in probability.
This philosophical change is necessary in order to use nonparametric statistics; otherwise we
would be forced to use a parametric model of the unmodeled dynamics.

We begin with a lemma about the differentiability of the state trajectory $x(t)$ when the
inputs are piecewise constant.

Lemma 1. Suppose $g_c(x,u)$ is $(Q-1)$-times differentiable. For $n \in \mathbb{Z}$, the trajectory $x(t)$ which
solves the ODE in (1) is once-differentiable everywhere, $Q$-times differentiable at $t \neq nT_u$,
and not twice-differentiable at $t = nT_u$.

Proof. The first time-derivative of $x(t)$ is given by (1), by definition. Because the inputs
are piecewise constant (2), the input $u(t)$ is not differentiable at $t = nT_u$. Because the first
time-derivative of $x(t)$ is a function of $u(t)$, this means that $x(t)$ is not twice-differentiable
at $t = nT_u$. Recall that $u(t)$ is constant for $t \neq nT_u$. Thus, $u(t)$ is smooth at $t \neq nT_u$.
This implies that $x(t)$ is $Q$-times differentiable at $t \neq nT_u$, because $g_c(x,u)$ is $(Q-1)$-times
differentiable. □

Remark. These qualitative features mean that we cannot use LPR methods with order higher
than zero (i.e., the Nadaraya-Watson estimator) without modifying the filtering scheme. This
is an important point, because the differentiability of the trajectory $x(t)$ makes it tempting
to use LPR. Yet, no theoretical convergence guarantees can currently be made in such a
situation, and the behavior of these filters may be unpredictable.

In light of these restrictions, we propose a modified sampling scheme. Recall that $T_u$ is
the sampling time for control inputs, and we define $T_s$ to be the sampling time for state
measurements. We require that $kT_s = T_u$ for some $k \in \mathbb{Z}_+$, and this scheme is illustrated in
Fig. 1 for the case of $k = 4$. The advantage of this sampling scheme is that the trajectory $x(t)$
is piecewise smooth (infinitely differentiable) in between the samples taken at $nT_u$, because
the control input $u(t)$ is piecewise constant. This allows us to use LPR of order higher than
zero (e.g., local linear regression), which can give significant improvements in estimation
error over zeroth order LPR.

If the trajectory of the real system is $x(t)$, then consider a measurement model

$$\xi_i(jT_s + nT_u) = x_i(jT_s + nT_u) + \epsilon_i, \quad \forall j \in \mathbb{Z}_+ : jT_s \in [0, T_u], \tag{5}$$

Figure 1: We use a sampling scheme with two sampling periods. The inputs change at
every $T_u$ units of time, and the states are measured every $T_s$ units of time. In this example,
$kT_s = T_u$ with $k = 4$.


where $\epsilon_i$ are independent and identically distributed (i.i.d.) random variables with zero
mean and bounded values $l_\mu \leq [\epsilon_i]_\mu \leq s_\mu$. The notation $[\epsilon_i]_\mu$ indicates the $\mu$-th component
of the $i$-th noise vector. Suppose that we have made measurements for $n = 0, \ldots, N$. This
measurement model corresponds to the sampling scheme seen in Figure 1.
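
To make the two-rate sampling scheme concrete, the following sketch lists the measurement times $jT_s + nT_u$ for one input period and draws bounded noise for a scalar state. The uniform noise distribution, the trajectory `math.sin`, and all function names are illustrative assumptions; the measurement model only requires zero-mean, bounded, i.i.d. noise.

```python
import math
import random

def measurement_times(T_u, k, n):
    """Times j*T_s + n*T_u for j = 0..k, with k*T_s = T_u."""
    T_s = T_u / k
    return [n * T_u + j * T_s for j in range(k + 1)]

def noisy_measurements(x_fn, times, lower, upper, rng):
    """Return measurements x(t) + eps with eps drawn uniformly from [lower, upper]."""
    return [x_fn(t) + rng.uniform(lower, upper) for t in times]

T_u, k = 1.0, 4
rng = random.Random(0)
times_0 = measurement_times(T_u, k, n=0)   # samples on [0, T_u]
times_1 = measurement_times(T_u, k, n=1)   # samples on [T_u, 2*T_u]
xi = noisy_measurements(math.sin, times_0, lower=-0.1, upper=0.1, rng=rng)
```

Consecutive input periods share the boundary sample at $t = (n+1)T_u$, which is why the filter in Section 3.1 can use that measurement from both the left and the right.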

3.1  Filter  Design 

Suppose $\kappa(\nu)$ is a kernel function, which is a bounded even function with finite support. We
will use $\lambda, \rho$ to denote left and right differentiability, and $r$ is the polynomial order of the
filter. Let $h_{\lambda,n,i}, h_{\rho,n,i} \in \mathbb{R}_+$ be bandwidth parameters. Next we define a diagonal matrix $R_{n,i}$
that is used to filter to the right side of the $i$-th entry of the measurement at $t = nT_u$; its
entries are given by

$$R_{n,i} = \mathrm{diag}\{\kappa(0), \kappa(T_s/h_{\lambda,n,i}), \ldots, \kappa(kT_s/h_{\lambda,n,i})\}. \tag{6}$$

Similarly, we define a diagonal matrix $L_{n,i}$ that is used to filter to the left side of the $i$-th
entry of the measurement at $t = nT_u$:

$$L_{n,i} = \mathrm{diag}\{\kappa(kT_s/h_{\rho,n,i}), \ldots, \kappa(T_s/h_{\rho,n,i}), \kappa(0)\}. \tag{7}$$

Note that the $R_{n,i}$ matrix uses the bandwidth $h_{\lambda,n,i}$, and $L_{n,i}$ uses bandwidth $h_{\rho,n,i}$. The
reason is that filtering to the right of a measurement requires left differentiability, while
filtering to the left of a measurement requires right differentiability. Lastly, we define the
Vandermonde matrix

$$\Gamma = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 1 & T_s & \cdots & T_s^r \\ \vdots & \vdots & & \vdots \\ 1 & kT_s & \cdots & k^r T_s^r \end{bmatrix}. \tag{8}$$

We are now ready to design the filter. The filter coefficients are given by

$$w_{n,i} = e_1'(\Gamma' L_{n,i} \Gamma)^{-1}\Gamma' L_{n,i}, \qquad v_{n,i} = e_1'(\Gamma' R_{n,i} \Gamma)^{-1}\Gamma' R_{n,i}, \tag{9}$$

and $e_1$ is the unit vector with a 1 in the first position and zeros everywhere else. The idea
is that $w_{n,i}$ filters on the left side of $t = nT_u$ and $v_{n,i}$ filters on the right side of $t = nT_u$.
As time advances to $t = NT_u$, we first filter on the left side of $\xi(NT_u)$ (because there is
no right side). At the next point in time $t = (N+1)T_u$ we filter on both sides of $\xi(NT_u)$.
Consequently, the filter is time-varying.
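
The coefficient formula can be sketched numerically as a kernel-weighted least-squares fit whose intercept is read off by $e_1$. This is a minimal sketch under stated assumptions: the Epanechnikov kernel is one admissible choice of $\kappa$, the names `epanechnikov` and `lpr_weights` are hypothetical, and the design matrix is built from signed time offsets measured from the estimation point (for the right-side filter these offsets are $0, T_s, \ldots, kT_s$; for the left-side filter they are $-kT_s, \ldots, -T_s, 0$).

```python
import numpy as np

def epanechnikov(v):
    """Bounded, even kernel with support |v| < 1 (an illustrative choice)."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) < 1.0, 0.75 * (1.0 - v**2), 0.0)

def lpr_weights(offsets, h, r, kernel=epanechnikov):
    """Return e_1' (G' W G)^{-1} G' W for offsets measured from the target time."""
    offsets = np.asarray(offsets, dtype=float)
    G = np.vander(offsets, N=r + 1, increasing=True)   # rows [1, t, ..., t^r]
    W = np.diag(kernel(offsets / h))                    # kernel weights on the diagonal
    coeffs = np.linalg.solve(G.T @ W @ G, G.T @ W)      # all polynomial coefficients
    return coeffs[0]                                    # first row picks the intercept

T_u, k, r, h = 1.0, 20, 1, 0.5
T_s = T_u / k
j = np.arange(k + 1)
v = lpr_weights(j * T_s, h, r)          # filters to the right of t = n*T_u
w = lpr_weights((j - k) * T_s, h, r)    # filters to the left of t = n*T_u
```

Because the weights come from a polynomial fit of order $r$, they sum to one and reproduce any polynomial of degree at most $r$ exactly at the estimation point. The bandwidth must be large enough that at least $r + 1$ samples receive positive kernel weight; otherwise the normal-equation matrix is singular.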

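Let the number within the angled brackets $\langle \cdot \rangle$ denote the (discrete) time at which the
filter is computed. The raw state estimates (for times $t = nT_u$, for $n = 0, \ldots, N$) computed
at time $t = NT_u$ are given by

$$\begin{aligned}
[\tilde{x}_N]_i\langle N\rangle &= \textstyle\sum_{j=0}^{k} [w_{N,i}]_j\, \xi_i(jT_s + (N-1)T_u) \\
[\tilde{x}_{N-1}]_i\langle N\rangle &= 1/2 \cdot [\tilde{x}_{N-1}]_i\langle N-1\rangle + 1/2 \cdot \textstyle\sum_{j=0}^{k} [v_{N-1,i}]_j\, \xi_i(jT_s + (N-1)T_u) \\
[\tilde{x}_n]_i\langle N\rangle &= [\tilde{x}_n]_i\langle N-1\rangle, \quad \forall n < N-1.
\end{aligned} \tag{10}$$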

The state estimates are given by

$$[\hat{x}_n]_i\langle N\rangle = \min\{\xi_i(nT_u) - l_i, \max\{\xi_i(nT_u) - s_i, [\tilde{x}_n]_i\langle N\rangle\}\}, \quad \forall n. \tag{11}$$

The operation in (11) maintains the bounds on the noise, and it makes sure that the filter
saturates if it tries to exceed the bounds of the noise. This filtering is well-defined because
of the piecewise continuity of the control input $u(t)$, and it is consistent in a pointwise sense,
as the following theorem shows.

Theorem 2 (Ruppert and Wand, 1994). If $T_u$ is fixed, $r$ is the polynomial order of the filter,
and $k \to \infty : kT_s = T_u$; then, the filter defined in (10)-(11) is consistent:
$\|x_n - \hat{x}_n\langle N\rangle\| = O_p(k^{-(r+1)/(2r+3)})$.

Proof. Strictly speaking, the result in [12] applies to the filter defined in (9)-(10). Consistency
with respect to (11) is established by noting that the bounds on the noise imply that
$\|x_n - \hat{x}_n\langle N\rangle\| \leq \|x_n - \tilde{x}_n\langle N\rangle\|$. □

Remark. Because $k = T_u/T_s$, this theorem intuitively says that the filter performs well as
long as $T_s$ is much smaller than $T_u$.
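
As a small worked example of the rate $k^{-(r+1)/(2r+3)}$ in Theorem 2, the sketch below evaluates it for a few values of $k$ and $r$. The constant in front of the rate is omitted, so these numbers indicate scaling only; the function name `filter_rate` is a hypothetical helper.

```python
def filter_rate(k, r):
    """Evaluate k^{-(r+1)/(2r+3)}, the rate in Theorem 2 with constants omitted."""
    return k ** (-(r + 1) / (2 * r + 3))

rate_linear_32 = filter_rate(32, 1)      # exponent -2/5, and 32 = 2^5
rate_linear_1024 = filter_rate(1024, 1)  # 1024 = 2^10
rate_quadratic = filter_rate(128, 2)     # exponent -3/7, and 128 = 2^7
```

For $r = 1$, increasing $k$ from 32 to 1024 (a factor of 32) reduces the rate term only by a factor of 4, which illustrates how slowly the guarantee improves with the oversampling ratio.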

We also have the following lemma which discusses the finite-sample properties of (11).
The intuition is that if the measurement noise is bounded and all states are measured,
then the filter preserves the property that the state estimates remain within a bounded
distance of the true states. Note that the Minkowski sum [14] of two sets $U, V$ is defined as
$U \oplus V = \{u + v : u \in U; v \in V\}$.

Lemma 2. Under the assumptions delineated above, we have that $\hat{x}[n] \in x[n] \oplus \mathcal{E}$, where
$\mathcal{E} = \{\epsilon : l_j \leq [\epsilon]_j \leq s_j\} \oplus (-\{\epsilon : l_j \leq [\epsilon]_j \leq s_j\})$.

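Proof. Note that (11) enforces that $\xi_j - s_j \leq \hat{x}_j \leq \xi_j - l_j$, which can be rewritten as
$\hat{x} \in \xi \oplus (-\{\epsilon : l_j \leq [\epsilon]_j \leq s_j\})$. The bounds on the noise $x_j + l_j \leq \xi_j \leq x_j + s_j$ are equivalent
to having $\xi \in x \oplus \{\epsilon : l_j \leq [\epsilon]_j \leq s_j\}$. The result follows from properties of $\oplus, \ominus$ [14]. □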
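
The saturation step (11) and the bound of Lemma 2 can be illustrated for a single state component. This is a sketch under assumptions: the names `saturate` and `lemma2_interval` and the numerical noise bounds are hypothetical, and the raw estimate is treated as an arbitrary number to show that the bound holds no matter how poor the raw filter output is.

```python
def saturate(raw_estimate, measurement, lower, upper):
    """Clip raw_estimate to [measurement - upper, measurement - lower], as in (11)."""
    return min(measurement - lower, max(measurement - upper, raw_estimate))

def lemma2_interval(lower, upper):
    """Interval containing (estimate - true state) for one component, per Lemma 2."""
    return lower - upper, upper - lower

lower, upper = -0.1, 0.2        # noise bounds l and s (illustrative values)
x_true = 1.0
errors = []
for eps in (-0.1, 0.0, 0.2):                 # admissible noise realizations
    for raw in (-5.0, 0.9, 1.0, 1.3, 5.0):   # arbitrary raw filter outputs
        x_hat = saturate(raw, x_true + eps, lower, upper)
        errors.append(x_hat - x_true)

lo_err, hi_err = lemma2_interval(lower, upper)
```

Every entry of `errors` lies in `[lo_err, hi_err]`, which is the one-dimensional version of the set $\{\epsilon : l \leq \epsilon \leq s\} \oplus (-\{\epsilon : l \leq \epsilon \leq s\})$.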

3.2  Filter  Implementation 

Because the filter is simply a weighted sum of measurements (10), the largest difficulty with
implementation is in computing the filter coefficients (9). The first step in doing this is to
choose the order of the filter. Empirical results show that linear ($r = 1$) or quadratic ($r = 2$)
LPR typically gives good results. For clarity of presentation, we focus here on the case of
$r = 1$.

Having chosen the order of the filter, the next step is to compute the bandwidth parameters
$h_{\lambda,n,i}, h_{\rho,n,i}$. To make the notation compact, let $\star$ be a placeholder that is replaced
with either $\star = \rho$ or $\star = \lambda$. Using results from [5], it can be shown that the optimal bandwidths
for $r = 1$ are approximately given by

$$h_{\star,n,i} = \left(\frac{a\sigma^2 T_u}{\ddot{x}_i(nT_u^\star)^2\, k}\right)^{1/5}, \tag{12}$$

and the second time-derivative $\ddot{x}_i(nT_u^\star)$ is the left-sided derivative if $\star = \lambda$ (or right-sided
derivative if $\star = \rho$). Unsimplified expressions for the cases $r > 1$ can be found in [5]. We
can approximate the values of these second time-derivatives by using (1). More specifically,
the estimated values are given by $\ddot{x}_i(nT_u^\rho) = [A_c^2\xi(nT_u) + A_c B_c u_n]_i$ and
$\ddot{x}_i(nT_u^\lambda) = [A_c^2\xi(nT_u) + A_c B_c u_{n-1}]_i$.

Because it is time consuming to compute the filter coefficients (9), we suggest an
implementation in which they are precomputed. Define a set $\mathcal{H} = \{h_1, \ldots, h_{\max}\}$, and compute
the filter coefficients for each value in $\mathcal{H}$. Then, when we would like to filter, we estimate
the time derivatives $\ddot{x}_i(nT_u^\lambda)$ and $\ddot{x}_i(nT_u^\rho)$, and use these to compute $h_{\lambda,n,i}$ and $h_{\rho,n,i}$. The closest value
in $\mathcal{H}$ is selected, and the corresponding set of precomputed filter coefficients is used to do
the filtering as defined in (10)-(11).
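
The lookup step of this precomputation scheme can be sketched as follows. The bandwidth expression used here assumes that (12) has the form $(a\sigma^2 T_u / (\ddot{x}^2 k))^{1/5}$, with $a$ a kernel-dependent constant and $\sigma^2$ a noise variance; both symbols, the cap at `h_max` for vanishing curvature, and the function names are assumptions made for this illustration.

```python
def optimal_bandwidth(xddot, k, T_u, a, sigma2, h_max):
    """Assumed form of (12) for r = 1, capped at h_max when curvature is small."""
    if xddot == 0.0:
        return h_max                    # zero curvature: use the widest window
    h = (a * sigma2 * T_u / (xddot**2 * k)) ** 0.2
    return min(h, h_max)

def nearest_bandwidth(h, grid):
    """Pick the grid value closest to h; filter coefficients are precomputed per grid value."""
    return min(grid, key=lambda g: abs(g - h))

grid = [0.1, 0.2, 0.4, 0.8]             # the precomputed bandwidth set (illustrative)
h = optimal_bandwidth(xddot=1.0, k=32, T_u=1.0, a=1.0, sigma2=1.0, h_max=grid[-1])
h_used = nearest_bandwidth(h, grid)
```

In an implementation, `h_used` would index a table of coefficient vectors computed offline from (9), so that the online cost is only the weighted sum in (10).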


4  Epi-convergence  Proofs 

We provide proofs of the theorems regarding convergence of the control law of LBMPC to an
MPC that knows the unmodeled dynamics, for both the case where the oracle is parametric
and the case where the oracle is the nonparametric oracle that we defined and call the
L2-regularized Nadaraya-Watson (L2NW) estimator. The key for these results is that the
system trajectory must have a property called sufficient excitation (SE), which intuitively
means that all modes of the system are perturbed so that they can be identified. The theorem
on convergence is trivial in the parametric case, because it results from combining two existing
theorems that are valid under SE. The proof for the L2NW case is more involved, since it
requires showing epi-convergence of the L2NW estimator under the notion of SE.

4.1  Parametric  Oracle 

Proof of Theorem 4 in [3]. The proof simply requires application of existing theorems. If $\theta_m$
converges in probability to $\theta_0$, then the result is true by Proposition 2.1 of [17]. The required
convergence in probability occurs under SE [7, 6, 9], and so the result trivially follows. □

Remark. The situation in which the states are measured with noise requires the use of the
continuous mapping theorem [15] taken in conjunction with Theorem 2. For the case where
the parameters enter linearly, the hypothesis of the continuous mapping theorem is satisfied
because the least-squares estimate $(X'X)^{-1}X'Y$ is continuous with respect to $X$ given SE [7]. For the nonlinear
case, we need to explicitly assume that the parameter estimate is continuous with respect to $X$ in order to be
able to apply the continuous mapping theorem.


4.2  Nonparametric  Oracle 

Showing that the L2NW estimator leads to convergence of the control law of LBMPC under
an assumption of SE requires additional work. For ease of reference, we give one expression
of the L2NW estimator defined in [3]. Let $X_i = [x_i'\ u_i']'$, $Y_i = x_{i+1} - (Ax_i + Bu_i)$, and
$\Xi_i = \|\xi - X_i\|^2/h^2$, where $X_i \in \mathbb{R}^{p+m}$ and $Y_i \in \mathbb{R}^p$ are data and $\xi = [x'\ u']'$ are free variables.
We define any function $\kappa : \mathbb{R} \to \mathbb{R}_+$ to be a kernel function if it has (a) finite support
$\kappa(\nu) = 0$ for $|\nu| \geq 1$, (b) even symmetry $\kappa(\nu) = \kappa(-\nu)$, (c) positive values $\kappa(\nu) > 0$ for
$|\nu| < 1$, (d) differentiability (i.e., the derivative $d\kappa$ exists), and (e) nonincreasing values of
$\kappa(\nu)$ over $\nu \geq 0$. The L2-regularized NW (L2NW) estimator is defined as

$$\mathcal{O}_m(x, u; X_i, Y_i) = \frac{\sum_i Y_i\, \kappa(\Xi_i)}{\lambda + \sum_i \kappa(\Xi_i)}, \tag{13}$$

where $\lambda \in \mathbb{R}_+$. If $\lambda = 0$, then (13) is simply the NW estimator. The $\lambda$ term acts to regularize
the problem and ensures differentiability.
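
A minimal sketch of the L2NW estimator, assuming the form in which a kernel-weighted sum of the $Y_i$ is divided by $\lambda$ plus the sum of the kernel weights. The biweight kernel is one choice satisfying properties (a)-(e); the names `biweight` and `l2nw` and the toy data are hypothetical.

```python
import numpy as np

def biweight(v):
    """Kernel with support |v| < 1, even, differentiable, nonincreasing on v >= 0."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) < 1.0, (1.0 - v**2) ** 2, 0.0)

def l2nw(xi, X, Y, h, lam, kernel=biweight):
    """L2NW estimate at query xi from data rows X[i] in R^{p+m} and Y[i] in R^p."""
    Xi = np.sum((X - xi) ** 2, axis=1) / h**2     # scaled squared distances
    weights = kernel(Xi)
    return (weights @ Y) / (lam + weights.sum())

X = np.array([[0.0], [1.0]])          # two data points (toy example)
Y = np.array([[2.0], [4.0]])
near = np.array([0.0])
far = np.array([10.0])

nw_value = l2nw(near, X, Y, h=0.5, lam=0.0)     # lam = 0: plain NW estimator
reg_value = l2nw(near, X, Y, h=0.5, lam=1.0)    # shrunk toward zero by lam
far_value = l2nw(far, X, Y, h=0.5, lam=1.0)     # no data in range: returns zero
```

With $\lambda > 0$ the denominator never vanishes, so the estimate is well defined (and equal to zero) at queries with no data inside the kernel support, whereas the plain NW estimator would divide by zero there.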

We  begin  by  proving  a  uniform  version  of  a  theorem  that  is  called  either  the  continuous 
mapping  theorem  [15]  or  Slutsky’s  theorem  [4],  depending  on  the  author. 

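Lemma 3. Given random variables $V_k, V \in \mathcal{V}$, for all $k \in \mathbb{Z}_+$, such that $\|V_k - V\| =
O_p(r_k)$; if $L(x,v) : \mathcal{X} \times \mathcal{V} \to \mathbb{R}$ is a continuous function and $\mathcal{X}, \mathcal{V}$ are compact sets, then
$\sup_{x \in \mathcal{X}} |L(x, V_k) - L(x, V)| = O_p(r_k)$.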

Proof. The Heine-Cantor theorem (Theorem 4.19 in [11]) gives uniform continuity of $L(x,v)$
on $\mathcal{X} \times \mathcal{V}$, and this implies that for all $x$, $\|V_k - V\| > \delta > 0$ whenever $|L(x, V_k) - L(x, V)| >
\epsilon > 0$. Proceeding analogously to [15], we have

$$P\big(\sup_x |L(x, V_k) - L(x, V)| > \epsilon\big) = P\big(\exists x : |L(x, V_k) - L(x, V)| > \epsilon\big) \leq P(\|V_k - V\| > \delta). \tag{14}$$

The result is immediate. □

We can now show the first convergence result for the L2NW estimator. Let $\hat{X}_i, \hat{Y}_i$ be
defined in the same way as $X_i, Y_i$, with the change that $\hat{X}_i, \hat{Y}_i$ are defined using state estimates
$\hat{x}$ instead of noiseless measurements of the state $x$. The intuition of why this result is true
is that though noise in $\hat{Y}_i$ and $\hat{X}_i$ is correlated, our filtering defined in Section 3 makes this
correlation asymptotically insignificant. This result can be interpreted in an instrumental
variables context [1, 8].

Corollary 1. If $\|\hat{X}_i - X_i\| = O_p(r_k)$, then
$\sup_{x \in \mathcal{X},\, u \in \mathcal{U}} \|\mathcal{O}_m(x, u; \hat{X}_i, \hat{Y}_i) - \mathcal{O}_m(x, u; X_i, Y_i)\| = O_p(r_k)$.

Proof. Define a random variable $V_k = [\hat{X}_1' \cdots \hat{X}_N'\ \hat{Y}_1' \cdots \hat{Y}_N']'$, and let $V$ be the
corresponding limiting vector. The definition of $Y_i$ and the corollary's assumption imply that
$\|\hat{Y}_i - Y_i\| = O_p(r_k)$, and so $\|V_k - V\| = O_p(N r_k)$.

Now consider the functions $\eta = \sum_i Y_i \kappa(\Xi_i)/N$, $\delta = \sum_i \kappa(\Xi_i)/N$, and $\rho = \eta/(\lambda/N + \delta)$.
Applying Lemma 3 gives $\sup_{\xi \in \mathcal{X} \times \mathcal{U}} \|\eta(\xi; V_k) - \eta(\xi; V)\| = O_p(r_k)$ and
$\sup_{\xi \in \mathcal{X} \times \mathcal{U}} \|\delta(\xi; V_k) - \delta(\xi; V)\| = O_p(r_k)$. Another application of Lemma 3 gives
$\sup_{\xi \in \mathcal{X} \times \mathcal{U}} \|\rho(\xi; V_k) - \rho(\xi; V)\| = O_p(r_k)$. The result follows by noting that
$\mathcal{O}_m(x, u; \hat{X}_i, \hat{Y}_i) = \rho(\xi; V_k)$ and $\mathcal{O}_m(x, u; X_i, Y_i) = \rho(\xi; V)$. □

Remark. The variance of the NW estimator in its typical setup is known to uniformly
converge at a rate no faster than n~4/-p+b [12]. Our result gives a nonstandard rate of
convergence $r_k$, because we have a time-series setup with presmoothing to account for the errors
in measurements.

Convergence of an estimator is often studied by decomposing the estimation error into
a bias and variance term. For proving convergence of our regularized NW estimator, we
have to be careful in defining the probabilistic framework before we can decompose the error
into two terms. The $X_i$ values are not independent variables drawn from some probability
distribution. They are exactly the states of a deterministic system, as it evolves in time. In
fact, if the control inputs $u_n$ are (deterministically or statistically) known for each point in
time, then $X_i$ and $X_j$ are dependent for all values where $i > j$.

For a nonlinear system, SE is usually defined using ergodicity or mixing, but this is
hard to verify in general. Instead, we define SE as a finite sample cover (FSC) of $\mathcal{X}$. Let
$B_h(x) = \{y : \|x - y\| \leq h\}$ be a ball centered at $x$ with radius $h$; then a FSC of $\mathcal{X}$ is a set
$S_h = \bigcup_i B_{h/2}(X_i)$ that satisfies $\mathcal{X} \subseteq S_h$. The intuition is that $\{X_i\}$ sample $\mathcal{X}$ with average
inter-sample distance less than $h/2$. Assuming SE in the form of a FSC with asymptotically
decreasing radius $h$, we can show that the control law of LBMPC that uses L2NW converges
to that of an MPC that knows the true dynamics.
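
The finite sample cover condition can be checked approximately by testing a finite set of points from the region of interest. This is a sketch only: verifying the condition on a finite grid is weaker than verifying it on the whole set, and the function name `is_finite_sample_cover` and the one-dimensional data are assumptions for illustration.

```python
import math

def is_finite_sample_cover(samples, points, h):
    """True if every test point lies within h/2 of at least one sample."""
    return all(
        any(math.dist(p, s) <= h / 2 for s in samples)
        for p in points
    )

samples = [(0.0,), (0.5,), (1.0,)]                 # visited (x, u) points
points = [(i / 10,) for i in range(11)]            # grid over the interval [0, 1]

covered_wide = is_finite_sample_cover(samples, points, h=0.5)    # balls of radius 0.25
covered_narrow = is_finite_sample_cover(samples, points, h=0.3)  # balls of radius 0.15
```

The same samples cover the grid for the larger radius but not for the smaller one, which reflects the requirement that the trajectory must keep exploring the set as $h$ decreases.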

Recall  that  g(x,u )  is  the  modeling  error  of  the  approximate  linear  system  defined  in 
[3].  We  have  the  following  result,  which  shows  that  the  L2NW  estimator  with  noiseless 
measurements  and  SE  can  approximate  the  unmodeled  dynamics  arbitrarily  well. 

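Lemma 4. If $g(x,u)$ is Lipschitz with constant $L$ and $S_h$ is a finite sample cover of $Z \subseteq \mathcal{X} \times
\mathcal{U}$, then $\sup_{(x,u) \in Z} \|g(x,u) - \mathcal{O}_m(x, u; X_i, Y_i)\| \leq \mu M_g + (1 - \mu)Lh$, where $\mu = \lambda/(\lambda + \kappa(1/2))$
and $M_g = \max \|x\| : x \in \mathcal{X}$.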

Proof. Define $I = \{i : \Xi_i < 1\}$, and note that $\kappa(\Xi_j) = 0$ for $j \notin I$. An alternative
characterization of the L2NW estimator is as the positively weighted average: $\mathcal{O}_m(x, u; X_i, Y_i) =
w_0 \cdot 0 + \sum_{i \in I} w_i \cdot Y_i$, where $w_0, w_i \geq 0$, $w_i = \kappa(\Xi_i)/(\lambda + \sum_j \kappa(\Xi_j))$, and $w_0 + \sum_i w_i = 1$
(so that $w_0 = \lambda/(\lambda + \sum_j \kappa(\Xi_j))$).
The finite sample cover property of $S_h$ implies that $\sum_i \kappa(\Xi_i) \geq \kappa(1/2)$. Noting $w_0 \leq
\lambda/(\lambda + \kappa(1/2))$ and $Y_i = g(X_i)$, the result follows from the triangle inequality. □
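
The bound of Lemma 4 can be compared against the actual approximation error on a scalar example. This sketch rests on several assumptions: the target function `math.sin` stands in for $g$ (Lipschitz constant 1), `M_g` is taken to be a bound on $|g|$ (which is what the triangle-inequality step uses), the biweight kernel is one admissible $\kappa$, and all function names are hypothetical.

```python
import math

def kernel(v):
    """Biweight kernel: support |v| < 1, even, nonincreasing on v >= 0."""
    return (1.0 - v * v) ** 2 if abs(v) < 1.0 else 0.0

def l2nw_scalar(xi, X, Y, h, lam):
    """Scalar L2NW estimate with kernel argument (xi - X_i)^2 / h^2."""
    weights = [kernel((xi - x) ** 2 / h**2) for x in X]
    return sum(w * y for w, y in zip(weights, Y)) / (lam + sum(weights))

def lemma4_bound(lam, h, lipschitz, M_g):
    """Evaluate mu * M_g + (1 - mu) * L * h with mu = lam / (lam + kernel(1/2))."""
    mu = lam / (lam + kernel(0.5))
    return mu * M_g + (1.0 - mu) * lipschitz * h

h, lam = 0.2, 0.1
X = [i * 0.05 for i in range(61)]          # samples on [0, 3]; spacing gives a cover
Y = [math.sin(x) for x in X]               # noiseless data Y_i = g(X_i)
queries = [i * 0.01 for i in range(301)]   # dense grid standing in for the set Z

worst = max(abs(math.sin(q) - l2nw_scalar(q, X, Y, h, lam)) for q in queries)
bound = lemma4_bound(lam, h, lipschitz=1.0, M_g=1.0)
```

In this example the observed worst-case error is well below the bound, which is expected because the bound uses only the single guaranteed sample within $h/2$, whereas several samples actually fall inside the kernel support.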

Remark. The result shows that the regularized NW estimator in our setup has bias $O(\lambda + h)$,
where $\lambda = O(h)$. This matches the bias of the NW estimator $O(h)$ in a standard setup at
both interior and boundary points [10, 12].


Theorem 3. Assuming $S_{h_n}$ is a finite sample cover of $Z \subseteq \mathcal{X} \times \mathcal{U}$, for some sequence
$h_n \to 0$; $\lambda = O(h_n)$; and $k$ is a sequence such that $T_u/k \to 0$ (see Theorem 2); then the
regularized NW estimator is uniformly consistent on $Z$ and converges at rate

$$\sup_{(x,u) \in Z} \|g(x,u) - \mathcal{O}_m(x, u; \hat{X}_i, \hat{Y}_i)\| = O(\lambda + h_n) + O_p(k^{-(r+1)/(2r+3)}). \tag{15}$$

Proof. Using the triangle inequality, the left-hand side of (15) is bounded by

$$\sup_{(x,u) \in Z} \|\mathcal{O}_m(x, u; \hat{X}_i, \hat{Y}_i) - \mathcal{O}_m(x, u; X_i, Y_i)\| + \sup_{(x,u) \in Z} \|g(x,u) - \mathcal{O}_m(x, u; X_i, Y_i)\|. \tag{16}$$

The first term is controlled by Corollary 1 and Theorem 2, and the second is governed by
Lemma 4. □

Remark. Ideally, we would like $Z = \mathcal{X} \times \mathcal{U}$, but this does not always happen. It requires that
the trajectory of the system sufficiently explores the space in a manner formalized by the
definition of finite sample cover. A set $Z$ which meets the assumptions of Theorem 3 always
exists, and this can be shown by construction: Given any $n > 0$, let $Z = \bigcup_{i=1}^{n} X_i$. A better
set $Z$ is defined as the limit of a convergent subsequence of $X_i$, and its advantage is that
the $X_i$ visit a neighborhood of the limit infinitely often. Such a limit is guaranteed to exist
by the Bolzano-Weierstrass theorem. These two constructions mean that there is always
some set on which the nonlinear dynamics $g(x,u)$ can be learned, and this set corresponds
to points which the trajectory visits.

We need the following theorem in order to show epi-convergence of $\psi_m$, as defined in [3],
for the LBMPC problem that uses the L2NW estimator (13) as the oracle.

Theorem 4. Let $\mathcal{X}_v, \mathcal{X}_w, \mathcal{R} \subseteq \mathbb{R}^n$ be closed and compact sets, and assume that we have a
sequence of functions $V_k(x) : \mathcal{X}_v \to \mathcal{X}_w$ and $W_k(x) : \mathcal{X}_w \to \mathcal{R}$ which converge in probability to
$V(x), W(x)$ as $\sup_{x \in \mathcal{X}_v} \|V_k(x) - V(x)\| = O_p(r_k)$ and $\sup_{x \in \mathcal{X}_w} \|W_k(x) - W(x)\| = O_p(s_k)$. If
$W$ is Lipschitz continuous with constant $L_w$, then $\sup_{x \in \mathcal{X}_v} \|W_k(V_k(x)) - W(V(x))\| = O_p(c_k)$,
where $c_k = \max\{r_k, s_k\}$.

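Proof. Applying the triangle inequality gives

$$\begin{aligned}
P\big(\sup_{x \in \mathcal{X}_v} \|W_k(V_k(x)) - W(V(x))\|/c_k > \epsilon\big) \leq{} & P\big(\sup_{x \in \mathcal{X}_v} \|W_k(V_k(x)) - W(V_k(x))\|/c_k > \epsilon\big) \\
& + P\big(\sup_{x \in \mathcal{X}_v} \|W(V_k(x)) - W(V(x))\|/c_k > \epsilon\big).
\end{aligned} \tag{17}$$

The first term on the right in (17) can be bounded as

$$P\big(\sup_{x \in \mathcal{X}_v} \|W_k(V_k(x)) - W(V_k(x))\|/c_k > \epsilon\big) \leq P\big(\sup_{x \in \mathcal{X}_w} \|W_k(x) - W(x)\|/c_k > \epsilon\big), \tag{18}$$

and so the limit of (18) by assumption is $\lim P(\sup_{x \in \mathcal{X}_v} \|W_k(V_k(x)) - W(V_k(x))\|/c_k > \epsilon) = 0$.
The second term on the right in (17) is bounded using the Lipschitz constant as

$$P\big(\sup_{x \in \mathcal{X}_v} \|W(V_k(x)) - W(V(x))\|/c_k > \epsilon\big) \leq P\big(\sup_{x \in \mathcal{X}_v} L_w \|V_k(x) - V(x)\|/c_k > \epsilon\big), \tag{19}$$

and taking its limit gives by assumption that $\lim P(\sup_{x \in \mathcal{X}_v} \|W(V_k(x)) - W(V(x))\|/c_k > \epsilon) = 0$.
The result follows by taking the limit of (17) and observing that the limit is equal to
zero. □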

Remark. The theorem shows that convergence in probability is preserved under composition,
but the one subtlety in the result and subsequent proof is the issue of domains of convergence.
We are composing two functions $W_k(V_k(x))$, and convergence occurs as long as the range of
the function on the inside $V_k(\cdot)$ lies within the domain of convergence of the function on the
outside $W_k(\cdot)$. By Theorem 5 of [3], the L2NW estimator has the required range.

With the previous theorem, we can now show that $\psi_m$ epi-converges. Consequently, the
control law of LBMPC with L2NW as oracle converges to that of MPC that knows the
unmodeled dynamics, when there is SE as defined by the appropriate FSC.

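Proof of Theorem 6 in [3]. Note that the equality constraint in LBMPC

$$x_{n+1} = Ax_n + Bu_n + \mathcal{O}_m(x_n, u_n) \tag{20}$$

recursively defines $x_{m+i}$, for $i \in \{0, \ldots, N\}$, as functions of only $x_m$ and $c$. For example, the
equation for $x_{m+2}$ is given by

$$\begin{aligned}
x_{m+2}(x_m, c, \mathcal{O}_m) ={} & A^2 x_m + AB(Kx_m + c_m) + A\mathcal{O}_m(x_m, Kx_m + c_m) \\
& + B\big(K(Ax_m + B(Kx_m + c_m) + \mathcal{O}_m(x_m, Kx_m + c_m)) + c_{m+1}\big) \\
& + \mathcal{O}_m\big(Ax_m + B(Kx_m + c_m) + \mathcal{O}_m(x_m, Kx_m + c_m), \\
& \qquad K(Ax_m + B(Kx_m + c_m) + \mathcal{O}_m(x_m, Kx_m + c_m)) + c_{m+1}\big).
\end{aligned} \tag{21}$$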

By Theorem 4 we have that $\sup_{x_m : \phi(x_m) \neq \emptyset} \|x_{m+i}(x_m, \mathcal{O}_m) - x_{m+i}(x_m, g)\| = O_p(r_m)$, where
$r_m$ is the convergence rate from Theorem 3. Since $\psi$ is continuous, we can compose it with
$x[m+i]$ using Theorem 4. This gives that $\sup_{x_m : \phi(x_m) \neq \emptyset} \|\psi_m - \psi_0\| = O_p(r_m)$.

The last step requires showing that this condition is equivalent to lower semicontinuity
in probability. Because $\psi_0$ is continuous, given $\epsilon > 0$ and a point $(x_0, c)$, there exists a
neighborhood $U\{x_0, c\}$ such that

$$|\psi_0(\zeta) - \psi_0(x_0, c)| < \epsilon/2, \tag{22}$$

for all $\zeta \in U\{x_0, c\}$. Now consider the expression

$$\alpha = P\big(\inf_{\zeta \in U\{x_0, c\}} \psi_m < \psi_0(x_0, c) - \epsilon\big) \leq P\big(\sup_{\zeta \in U\{x_0, c\}} |\psi_m - \psi_0(x_0, c)| > \epsilon\big). \tag{23}$$

Using (22), we can further bound the expression above by

$$\alpha \leq P\big(\sup_{\zeta \in U\{x_0, c\}} |\psi_m - \psi_0| > \epsilon/2\big). \tag{24}$$

Taking the limit, we have that $\lim \alpha = 0$, and so the result follows by applying Proposition
5.1 of [16]. □
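
The recursion (20) and its two-step expansion (21) can be checked against each other numerically. This is a sketch with assumed values: the matrices `A`, `B`, `K`, the perturbation sequence `c`, and the stand-in `oracle` function are all illustrative and are not taken from [3]; the control input is $u_n = Kx_n + c_n$, as in the expansion.

```python
import numpy as np

def rollout(x_m, c, A, B, K, oracle):
    """Iterate x_{n+1} = A x_n + B u_n + O_m(x_n, u_n) with u_n = K x_n + c_n."""
    states = [np.asarray(x_m, dtype=float)]
    for c_n in c:
        x = states[-1]
        u = K @ x + c_n
        states.append(A @ x + B @ u + oracle(x, u))
    return states

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[-1.0, -1.5]])

def oracle(x, u):
    """Stand-in for the oracle: any map from (x, u) to a vector in R^p."""
    return 0.05 * np.tanh(x) + 0.01 * u[0] * np.ones_like(x)

x_m = np.array([1.0, -0.5])
c = [np.array([0.2]), np.array([-0.1])]
states = rollout(x_m, c, A, B, K, oracle)

# Two-step expansion in the form of (21), with u_m and x_{m+1} used as shorthands.
u_m = K @ x_m + c[0]
x_m1 = A @ x_m + B @ u_m + oracle(x_m, u_m)
x_m2 = (A @ A @ x_m + A @ B @ u_m + A @ oracle(x_m, u_m)
        + B @ (K @ x_m1 + c[1]) + oracle(x_m1, K @ x_m1 + c[1]))
```

The agreement between `states[2]` and `x_m2` shows that the predicted states depend only on $x_m$, the sequence $c$, and the oracle, which is the structure the proof exploits when composing convergent functions.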